Clustering high dimensional data using subspace and projected clustering algorithms

نویسندگان

  • Rahmat Widia Sembiring
  • Jasni Mohamad Zain
  • Abdullah Embong
چکیده

Problem statement: Clustering has a number of techniques that have been developed in statistics, pattern recognition, data mining, and other fields. Subspace clustering enumerates clusters of objects in all subspaces of a dataset. It tends to produce many over lapping clusters. Approach: Subspace clustering and projected clustering are research areas for clustering in high dimensional spaces. In this research we experiment three clustering oriented algorithms, PROCLUS, P3C and STATPC. Results: In general, PROCLUS performs better in terms of time of calculation and produced the least number of un-clustered data while STATPC outperforms PROCLUS and P3C in the accuracy of both cluster points and relevant attributes found. Conclusions/Recommendations: In this study, we analyze in detail the properties of different data clustering method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Comparison of Subspace Projection Method with Traditional Clustering Algorithms for Clustering Electricity Consumption Data

There are many studies about using traditional clustering algorithms like K-means, SOM and Two-Step algorithms to cluster electricity consumption data for definition of representative consumption patterns or for further classification and prediction work. However, these approaches are lack of scalability with high dimensions. Nevertheless, they are widely used, because algorithms for clustering...

متن کامل

Evaluation of Monte Carlo Subspace Clustering with OpenSubspace

We present the results of a thorough evaluation of the subspace clustering algorithm SEPC using the OpenSubspace framework. We show that SEPC outperforms competing projected and subspace clustering algorithms on synthetic and some real world data sets. We also show that SEPC can be used to effectively discover clusters with overlapping objects (i.e., subspace clustering).

متن کامل

A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets

Many real applications are required to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies on the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspacebased outlier detection methods identify outliers by searching for abnormal sparse density units in s...

متن کامل

A Framework for Evaluation and Exploration of Clustering Algorithms in Subspaces of High Dimensional Databases

In high dimensional databases, traditional full space clustering methods are known to fail due to the curse of dimensionality. Thus, in recent years, subspace clustering and projected clustering approaches were proposed for clustering in high dimensional spaces. As the area is rather young, few comparative studies on the advantages and disadvantages of the different algorithms exist. Part of th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1009.0384  شماره 

صفحات  -

تاریخ انتشار 2010